Learning To See

Part 15: Information

In [1]:

%pylab inline
from supportFunctions import *
import cPickle as pickle
import time

pickleFileName = 'data/fingerDataSet' + '.pickle'
pickleFile = open(pickleFileName, 'rb')
data = pickle.load(pickleFile)
pickleFile.close()

Populating the interactive namespace from numpy and matplotlib

In [2]:

#Return to just using 3 images at training data:
trainingExampleIndices = [7, 30, 46]
trainingExamples = [data[index] for index in trainingExampleIndices]
trainX, trainY = extractExamplesFromList(trainingExamples, whichImage = 'image1bit', dist = 4)

#And two for testing:
testingExampleIndices = [40, 41]
testingExamples = [data[index] for index in testingExampleIndices]
testX, testY = extractExamplesFromList(testingExamples, whichImage = 'image1bit', dist = 4)

Let's use scikit learn's decision tree implementation¶

In [3]:

from sklearn import tree

clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3)
clf = clf.fit(trainX, trainY)
yHat = clf.predict(testX)

from sklearn.metrics import confusion_matrix
confusion_matrix(yHat, testY)

Out[3]:

array([[4428,  103],
       [ 105,  238]])

In [4]:

from sklearn.externals.six import StringIO
from IPython.display import Image  
import pydot

dot_data = StringIO()  
tree.export_graphviz(clf, out_file=dot_data) 
graph = pydot.graph_from_dot_data(dot_data.getvalue())  
Image(graph.create_png())  

Out[4]:

In [5]:

treeRule1 = lambda X: np.logical_and(np.logical_and(X[:, 40] == 1, X[:,0] == 0), X[:, 53] == 0)

ruleVector = np.zeros(81)
ruleVector[40] = 1; ruleVector[0] = -1; ruleVector[53] = -1

fig = figure(0, (6,6))
imshow(ruleVector.reshape(9,9), interpolation = 'none', cmap = rwb)

Out[5]:

<matplotlib.image.AxesImage at 0x10fc7fd10>

In [8]:

fig = figure(0, (14, 8))
testLogicalRules(trainingExampleIndices, data, fig, trainX, trainY, treeRule1)

Confusion Matrix:
[[ 330  165]
 [  79 7293]]
Recall (TPR) = 0.667 (Portion of fingers that we "caught")
Precision (PPV) = 0.807(Portion of predicted finger pixels that were actually finger pixels)
Accuracy = 0.969

In [10]:

fig = figure(0, (14, 8))
testLogicalRules(testingExampleIndices, data, fig, testX, testY, treeRule1)

Confusion Matrix:
[[ 238  103]
 [ 105 4428]]
Recall (TPR) = 0.698 (Portion of fingers that we "caught")
Precision (PPV) = 0.694(Portion of predicted finger pixels that were actually finger pixels)
Accuracy = 0.957

Just for fun, does GINI come out the same?¶

In [13]:

from sklearn import tree

clf = tree.DecisionTreeClassifier(criterion = 'gini', max_depth = 3)
clf = clf.fit(trainX, trainY)
yHat = clf.predict(testX)

from sklearn.metrics import confusion_matrix
confusion_matrix(yHat, testY)

Out[13]:

array([[4428,  103],
       [ 105,  238]])

In [14]:

from sklearn.externals.six import StringIO
from IPython.display import Image  
import pydot

dot_data = StringIO()  
tree.export_graphviz(clf, out_file=dot_data) 
graph = pydot.graph_from_dot_data(dot_data.getvalue())  
Image(graph.create_png())  

Out[14]:

Minor differences, but the rule comes out the same, cool!

In [ ]: